ABSTRACT 

The present study involved the testing of two common 
multiple-choice item writing rules. A recent review of research 
revealed that much of the advice given for writing multiple-choice 
test items is based on experience and wisdom rather than on empirical 
research. The rules assessed in this study include: (1) the phrasing 
of the stem in the form of a question versus a partial sentence; and 
(2) the use of the inclusive "none of the above" option instead of a 
specific content option. Limited empirical research suggests that 
using the partial sentence format and the inclusive "none of the 
above" option may lead to undesirable item and test characteristics, 
while textbook authors essentially are divided on their opinions 
about the validity of each rule. The items used in this study were 
from the instructor's manual for D. Myers's (1986) text entitled 
"Psychology." Items were randomly assigned to be rewritten to reflect 
the experimental conditions under investigation. Two instructors of 
an introductory psychology course selected 32 multiple-choice items 
for the study. The rewritten tests were administered to 228 students 
enrolled in two sections of an introductory psychology class. About 
half of the students in each section received Form A and the other 
half received Form B, resulting in 115 Form A and 113 Form B 
responses. The same manipulated items were combined with 18 different 
non-manipulated items in a third section of the class to comprise 
Forms C and D, whose administration resulted in 59 Form C and 59 Form 
D responses. Results offer no evidence to support the use of either 
type of stem and limited evidence to caution against use of the "none 
of the above" option. Two data tables and examples of the four item 
formats used are provided. (TJH) 
The Validity of Two Item-Writing Rules 

A recent review of research revealed that much of the advice 
given for writing multiple-choice test items is based on 
experience and wisdom rather than empirical research. The 
present study involved the testing of two common item writing 
rules: (1) the phrasing of the stem in the form of a question 
versus a partial sentence and (2) the use of the inclusive "none 
of the above" option instead of a specific content option. 
Limited empirical research suggests that using the partial 
sentence format and the inclusive "none of these" option may lead 
to undesirable item and test characteristics, while textbook 
authors essentially are divided on their opinions about the 
validity of each rule. Results of this experimental study offer 
no evidence to support the use of either type of stem and limited 
evidence to caution against use of the option "none of the above." 
The Validity of Two Item-Writing Rules 

A number of writers in the field of educational measurement 
have commented that multiple-choice (MC) item writing, despite 
its widespread popularity and use, has received little scholarly 
attention in the past (Cronbach, 1970; Ebel, 1951; Millman & 
Green, in press; Nitko, 1984; Roid & Haladyna, 1982; Wesman, 
1971; Wood, 1977). In a review of empirical research on item 
writing, Haladyna and Downing (1989a) reported finding 96 
empirical studies of which 53 dealt with only two item-writing 
practices, the optimal number of options and the desirability of 
key balancing. Most item-writing rules have been studied fewer 
than 10 times. Thus the empirical foundation for the validity of 
many item-writing rules is weak, and the basis for many rules is 
often authoritative wisdom passed on through textbooks and other 
professional publications and presentations. 

The study reported here addresses two item-writing rules 
which are popularly prescribed in treatments on MC item writing 
in textbooks and other sources in the educational measurement 
literature (Haladyna & Downing, 1989b). The first rule is: 
"Don't use 'none of the above' as an option"; the second rule is 
"Use either the question format or the completion format when 
phrasing the stem." 

In a review of 46 references dealing with the topic MC [...] 
these references stated support or lack of support for the "Don't 
use 'none of the above' as an option" rule. This was the tenth 
most often mentioned rule, and this survey was taken as evidence 
of the importance of the rule for item writers. However, authors 
were divided on their support for this rule, with 19 for and 1^ 
against. Obviously some controversy exists over the validity of 
the rule. 

Empirical research on this item-writing rule has been 
limited to only ten studies (Boynton, 1950; Dudycha & Carpenter, 
1973; Forsyth & Spratt, 1980; Hughes & Trimble, 1965; Mueller, 
1975; Oosterhof & Coats, 1984; Rimland, 1960; Schmeiser & 
Whitney, 1975; Wesman & Bennett, 1946; Williamson & Hopkins, 
1967). All of these studies involved the item characteristic of 
difficulty, but only five studied item discrimination and 
reliability, and only two validity. In all instances, the use of 
the "none of the above" option made items more difficult; the mean 
effect across nine studies where results were aggregable was 
4.6%. With discrimination, avoiding the inclusive "none of the 
above" option made items slightly more discriminating, .03, while 
reliability was improved by .04. 

Question Format Versus Completion Format 

One of the most fundamental requirements in MC item 
writing is that one states the item in a question format or a 
completion format. On the surface there appears to be no reason 
[...] (1989a), the rule is one of the most common given in treatments 
on MC item writing; 41 of 46 references mentioned it, and all 41 
support the use of either format. Paradoxically, the small body 
of empirical research leads to the opposite conclusion. 

Studies of this item-writing rule include: Board and 
Whitney (1972), Dudycha & Carpenter (1973), Dunn & Goldstein 
(1959), Schmeiser & Whitney (1975a; 1975b), and Schrock & Mueller 
(1982). These six studies observed effects on item difficulty in 
each instance, discrimination in three cases, reliability four 
times, and validity twice. In general, the question format 
appears to have an advantage over the sentence completion format 
with respect to making items slightly easier, having little or no 
effect on item discrimination, and making test scores based on 
such items more reliable and valid. For reliability, the 
improvement was a median .065, which is a reduction of 6.5% error 
variance in test scores. Validity was improved by .06 in two 
studies (Board & Whitney, 1972; Schmeiser & Whitney, 1975b). 
Based on these few studies, it appears the evidence favors the 
use of the question format over the completion format in phrasing 
the MC stem. 

The present study further investigates these two item- 
writing rules. 
The items used in this study were from the instructor's 
manual for Myers's (1986) text entitled Psychology. Two 
instructors of an introductory psychology course selected 32 MC 
items for the study. Each item was keyed to the objectives of 
the course and met the standard requirements for MC item writing. 
Each item also had adequate performance characteristics as judged 
from previous uses. Items were randomly assigned to be rewritten 
to reflect the experimental manipulations as outlined below: 



No. of items   Version 1                        Version 2 

     8         completion, option 'e' (CE)      completion, none of these (CN) 
     8         question, option 'e' (QE)        completion, option 'e' (CE) 
     8         question, none of these (QN)     question, option 'e' (QE) 
     8         completion, none of these (CN)   question, none of these (QN) 

Figure 1 provides an example of one item written in all four 
variations. 



Insert Figure 1 about here 
The manipulations were balanced both within and between the two 
versions. Version 1 items were combined with eighteen non- 
manipulated items to comprise Form A of the final exam for two 
sections of an introductory psychology class, while Version 2 
items were combined with the same eighteen items to comprise Form 
B. 

Test forms were key balanced with the option 'none of these' 
being keyed three times in sixteen appearances or approximately 
one-fifth of the time. 

The tests were administered to two sections of the class 
with approximately one-half the students in each section 
receiving Form A and the other half receiving Form B, resulting in 
115 Form A and 113 Form B responses. In addition, the same 
manipulated items were combined with eighteen different non- 
manipulated items in a third section of the class to comprise 
Forms C and D. Forms were key balanced as above and test 
administration in this class resulted in 59 Form C and 59 Form D 
responses. 

This design was chosen to allow comparison of item format 
manipulations controlling for examinee ability. That is, when 
Version 1 CE items are combined with Version 2 QE items, we have 
sixteen items not employing the option 'none of these'. When 
Version 1 QN items are combined with Version 2 CN items, we have 
these same sixteen items employing the option 'none of these'. 
Item characteristics can be compared between these sixteen item 
[...] other of the eight item subscales under each condition. Since, 
at best, small effect sizes were anticipated, hypothesis testing 
was conducted with alpha set at the .10 level for each 
statistical test. 
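
The pairing logic of this design can be sketched in a few lines of Python. This is only an illustration of how the four eight-item groups combine into sixteen-item comparison sets; the names DESIGN and comparison_sets are hypothetical and do not come from the study, and the format codes follow the design outline above (C/Q for completion/question stem, E/N for specific option 'e' or 'none of these').

```python
# Each group of eight items appears in one format on Version 1 and in
# another format on Version 2 (codes as read from the design outline).
DESIGN = [
    # (items in group, Version 1 format, Version 2 format)
    (8, "CE", "CN"),
    (8, "QE", "CE"),
    (8, "QN", "QE"),
    (8, "CN", "QN"),
]


def comparison_sets(position):
    """Return the groups whose two versions differ only at `position`.

    position 0 compares stems (C vs Q); position 1 compares options (E vs N).
    """
    other = 1 - position
    return [
        (n, v1, v2)
        for n, v1, v2 in DESIGN
        if v1[position] != v2[position] and v1[other] == v2[other]
    ]


option_groups = comparison_sets(1)  # specific option vs 'none of these'
stem_groups = comparison_sets(0)    # question vs completion stem

print("option comparison:", option_groups)
print("stem comparison:  ", stem_groups)
```

Under this reading, each comparison draws on two groups, so every contrast rests on the same sixteen items seen in both conditions.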

Table 1 presents the means and standard deviations of item 
difficulties, mean point-biserials, and the Kuder-Richardson 20 
reliability estimates of each subscale for the four forms of the 
test. 
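
For readers unfamiliar with these statistics, the sketch below shows one common way to compute them from a matrix of 0/1 item scores. It is not the authors' procedure: the function name item_statistics and the toy data are hypothetical, the point-biserial is taken against the total score with the item included, and population variances are assumed throughout.

```python
from statistics import mean, pstdev


def item_statistics(responses):
    """responses: list of examinee rows, each a list of 0/1 item scores."""
    n_items = len(responses[0])
    totals = [sum(row) for row in responses]
    total_mean, total_sd = mean(totals), pstdev(totals)

    difficulties, point_biserials = [], []
    for j in range(n_items):
        scores = [row[j] for row in responses]
        p = mean(scores)  # item difficulty = proportion answering correctly
        difficulties.append(p)
        if 0 < p < 1 and total_sd > 0:
            mean_correct = mean(t for t, s in zip(totals, scores) if s == 1)
            # point-biserial: (M_correct - M_all) / SD_all * sqrt(p / q)
            r_pb = (mean_correct - total_mean) / total_sd * (p / (1 - p)) ** 0.5
        else:
            r_pb = float("nan")  # undefined when everyone agrees on the item
        point_biserials.append(r_pb)

    # KR-20 = k / (k - 1) * (1 - sum(p * q) / variance of total scores)
    pq = sum(p * (1 - p) for p in difficulties)
    kr20 = n_items / (n_items - 1) * (1 - pq / total_sd ** 2)
    return difficulties, point_biserials, kr20


# Hypothetical toy data: five examinees, four items.
toy = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]
difficulties, point_biserials, kr20 = item_statistics(toy)
print(difficulties, point_biserials, kr20)
```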



Insert Table 1 about here 



In order to test for differences in difficulty and 
discrimination for the question versus completion format, item 
statistics for the Form A-QE items were combined with item 
statistics for the Form B-QN items and were compared to the Form 
A-CN items combined with the Form B-CE items. Similarly, item 
statistics for the same item types on Forms C and D were 
combined. In order to test for differences in difficulty and 
discrimination for the inclusive versus specific option 
hypothesis, item statistics for Form A-CE items were combined with 
Form B-QE and were compared to the Form A-QN items combined with 
Form B-CN items. Similarly, item statistics for the same item 
types were combined on Forms C and D. Summary statistics for the 
combined items are presented in Table 2. 



Insert Table 2 about here 



DIFFICULTY 

The observed difference in difficulty was .02 higher for the 
question format. A correlated one-tailed t-test showed non- 
significance at the .10 level (t = .56, df = 15, r = .70, p = 
.29). The t-test for the same comparison on Forms C and D showed 
similar results with a mean difference of .003 and a non- 
significant t statistic (t = .10, df = 15, r = .76, p = .46). 
Differences between using and not using the option 'none of 
these' were tested by combining Form A CE with Form B QE item 
difficulties and comparing these with Form A QN and Form B CN item 
difficulties. The difference in mean difficulty was .027 with 
use of 'none of these' being lower. The dependent t-test was 
significant at the .10 level (t = 1.44, df = 15, r = .916, p = 
.085). The same test for Forms C and D had similar results with 
a mean difference of .043 (t = 1.59, df = 15, r = .67, p = .065). 
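
As a rough check on the tests reported above, the following sketch computes a correlated (dependent-samples) t statistic and a one-tailed p value using only the Python standard library. It is an illustration under stated assumptions rather than the authors' analysis: the function names are hypothetical, the item-level data are not available here, and the p value is obtained by numerically integrating the Student t density.

```python
import math


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)


def one_tailed_p(t, df, steps=2000):
    """P(T > t) for t >= 0, using Simpson's rule on the density over [0, t].

    `steps` must be even for Simpson's rule.
    """
    h = t / steps
    area = t_pdf(0, df) + t_pdf(t, df)
    for i in range(1, steps):
        area += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - area * h / 3


def correlated_t(x, y):
    """Dependent-samples t statistic and df for paired lists x and y."""
    d = [a - b for a, b in zip(x, y)]
    n = len(d)
    mean_d = sum(d) / n
    sd_d = math.sqrt(sum((v - mean_d) ** 2 for v in d) / (n - 1))
    return mean_d / (sd_d / math.sqrt(n)), n - 1


# One-tailed p values for the reported t statistics, each with df = 15.
for t in (0.56, 0.10, 1.44, 1.59):
    print(t, round(one_tailed_p(t, 15), 3))
```

With sixteen items per comparison, df = 15 in every test, so the reported t values alone are enough to recompute the p values approximately.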

DISCRIMINATION 

Differences in mean point-biserials between the question and 
completion formats were non-significant for both replications. 
Differences in mean point-biserials between using and not using 
the inclusive 'none of these' option were .034 and .033 for Form 
A vs Form B and Form C vs Form D respectively and favored not 
using the inclusive option in both instances. The observed 
differences, however, failed to reach significance at the .10 
level. The correlated t-tests for Form A versus Form B and Form 
C versus Form D had p values of .18 and .20 respectively. 



While this study fails to offer support to a recommendation 
regarding use of either the question or completion format over 
the other, observed results regarding use of the "none of these" 
option are consistent with previous findings in direction and 
magnitude. Differences in difficulty were statistically 
significant and in the 3 to 4% range favoring the specific option 
over the inclusive option format. Item discriminations were also 
observed to be slightly over .033 higher for the specific option 
format. This result, while not statistically significant, is at 
the same level as observed in previous research. Lack of 
statistical significance may be attributable to the low power to 
detect a difference of this magnitude with sixteen subjects 
(items) and the low correlations between the item discriminations 
between forms (.183, .488). It is noted that differences in item 
[...] differences in reliability of about .04 favoring use of the 
specific option over use of "none of these". Future research on 
this should use the knowledge of this effect size to determine 
the sample size necessary to detect a .03 or greater effect with 
reasonable power. 
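
A minimal sketch of such a sample-size calculation follows, assuming a one-tailed correlated t-test approximated by the normal distribution. The function names are hypothetical, and the standard deviation of item discriminations (.10) is an assumed value chosen only for illustration; the study does not report it in the text. The between-form correlations (.183 and .488) are the ones reported above.

```python
import math
from statistics import NormalDist


def sd_of_differences(sd, r):
    """SD of paired differences when both conditions share the same SD
    and correlate r: sqrt(2 * sd^2 * (1 - r))."""
    return math.sqrt(2 * sd * sd * (1 - r))


def items_needed(effect, sd_diff, alpha=0.10, power=0.80):
    """Approximate number of paired items for a one-tailed test, using
    n = ((z_alpha + z_power) * sd_diff / effect)^2 (normal approximation)."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha)
    z_power = z.inv_cdf(power)
    return math.ceil(((z_alpha + z_power) * sd_diff / effect) ** 2)


# ASSUMPTION: SD of item discriminations = .10 (illustrative only).
for r in (0.183, 0.488):
    n = items_needed(0.03, sd_of_differences(0.10, r))
    print(f"r = {r}: about {n} items")
```

The sketch makes the dependence on the between-form correlation explicit: the lower the correlation between item discriminations across forms, the more items are needed to detect the same .03 difference.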

Table 1. Means and standard deviations for item difficulties [...] 

*Reliability estimate based on average point-biserials for 
sixteen items after Guilford (1965). 

(CN) In their classic nine-year study, Friedman and Rosenman found 
     that competitive, hard-driving, impatient, and easily 
     angered individuals are especially susceptible to: 

       a. stomach ulcers. 
       b. cancer. 
     * c. heart attacks. 
       d. accidents. 
       e. none of these 

(QN) In their classic nine-year study, Friedman and Rosenman 
     found that competitive, hard-driving, impatient, and 
     easily angered individuals are especially susceptible to 
     which of the following? 

       a. stomach ulcers 
       b. cancer 
       c. strokes 
       d. accidents 
     * e. none of these 

(CE) In their classic nine-year study, Friedman and Rosenman found 
     that competitive, hard-driving, impatient, and easily 
     angered individuals are especially susceptible to: 

       a. stomach ulcers. 
       b. cancer. 
     * c. heart attacks. 
       d. accidents. 
       e. strokes. 

(QE) In their classic nine-year study, Friedman and Rosenman found 
     that competitive, hard-driving, impatient, and easily 
     angered individuals are especially susceptible to which of 
     the following? 

       a. stomach ulcers 
       b. cancer 
     * c. heart attacks 
       d. accidents 
       e. strokes 



